
Over the last five years, “micromobility” companies, which operate fleets of small, lightweight vehicles such as scooters and bikes that can be rented for short periods of time, have proliferated. The City of Austin operates its own such service, MetroBike, which runs bike stations in dozens of locations around the city. Austin’s enjoyable climate and extensive array of activities make the city’s shareable bikes popular among local commuters and tourists alike. However, Austin has a very diverse population that is segregated in many ways into specific areas of town. Moreover, crime is more common in some of these areas than in others. Our question is: do these factors, and others, affect MetroBike’s ridership numbers? Do certain stations require more bicycles than others?
In this project, our team seeks to quantify the relationship between crime and ridership for a given station and looks to build upon this by identifying other useful features which may impact the number of rides that originate from any one station. If we are able to identify such features, and those features have a statistically significant impact on ridership at a given station, we hope that our research could help MetroBike better service its stations and ensure that enough bikes are available at a given time. Not only does this lead to a better experience for patrons of the company, but it should also allow the company to operate more efficiently.
As stated, the problem we are addressing is the potentially inefficient management of MetroBike’s bike-sharing stations around the city of Austin. This problem was identified through the inspection of several stations around the city where MetroBike’s shareable bicycles sat idle. As each bike is rented for a very short period of time, the use of those bikes represents revenue for the company and, in effect, the city. Idle bikes represent lost revenue.
But why do these bikes sit idle? Are there characteristics of the surrounding area that deter commuters and tourists from using the bikes? Is it due to crime in the area? How much of a role does weather play in rentals there? In other words, are the bikes used to get to work, in which case a cloudy, overcast day is unlikely to affect a station’s popularity, or are they used primarily for recreation? Is there a festival, a football game, or some other popular event occurring on that day and in that area?
Each of these questions represents a potential feature that can be added to our data and eventually modeled to understand and hopefully predict the frequency of rides originating at a specific MetroBike station. The importance of this is clear from the initial exploration of the problem. If we can understand the relationship between each of these features and MetroBike’s ridership, we can hopefully predict the ridership on a given day at a given station. If we can successfully do this, MetroBike can expend its resources more efficiently, stocking up stations that are likely to be busy and thereby pulling in more revenue than it would by leaving bikes in unused locations. This is a common exercise for businesses that rely on the efficient expenditure of resources to facilitate their customers’ behavior, simultaneously saving money while potentially bringing in more revenue.

After joining the MetroBike datasets, there are three pieces of intelligence we hope to derive from our data. Each of these insights will be found using regression analysis. To do this, we will join the bike share, weather, and crime tables on date.
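As a sketch of that join (the column names below are illustrative, not the datasets' actual schemas), daily ride counts can be merged with the weather and crime tables on date using pandas:

```python
import pandas as pd

# Hypothetical minimal frames; the real datasets have many more columns.
rides = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-01-01", "2014-01-02"]),
    "start_station_id": [2500, 2501, 2500],
})
weather = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-01-02"]),
    "high_temp_f": [45, 60],
    "precip_in": [0.01, 0.0],
})
crime = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-01-02"]),
    "crime_count": [12, 7],
})

# Aggregate rides to one row per day, then left-join weather and crime on date.
daily = (
    rides.groupby("date").size().rename("ride_count").reset_index()
    .merge(weather, on="date", how="left")
    .merge(crime, on="date", how="left")
)
```

A left join keeps every day with ridership even if a weather or crime record is missing for that date.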
From this master database, we will seek to run a regression with the goal of determining the most popular stations from which users are likely to initiate a bike share. The station from which a ride was initiated will be our dependent variable and the features such as weather and crime on a given day will be our independent variables.
Clearly, the number of features will need to be whittled down to reduce dimensionality and prevent overfitting. We will reduce features using a mixture of manual pruning (removing features we do not think will play a role in our analysis, or that are redundant across datasets) and lasso feature selection. Prior to the lasso step, we will initialize weights with an ordinary least squares fit (minimizing the sum of squared errors). We will then apply lasso's L1 penalty to the model, hopefully to the point where features deemed unimportant are dropped entirely.
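A minimal sketch of the lasso selection step on synthetic data (the alpha value and feature count here are illustrative, not the tuned values): the L1 penalty drives uninformative weights exactly to zero, leaving a shortlist of surviving features.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # six candidate features
# Only features 0 and 1 actually drive the target in this synthetic example.
y = 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Standardize first: the L1 penalty assumes features on comparable scales.
X_scaled = StandardScaler().fit_transform(X)
model = Lasso(alpha=0.5).fit(X_scaled, y)

# Features whose weights survive the penalty form the reduced feature set.
kept = [i for i, w in enumerate(model.coef_) if abs(w) > 1e-6]
```

On this synthetic example the four noise features are dropped, while the two informative ones are kept (with their weights shrunk toward zero).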
With our dataset in place, and our features narrowed down, we will seek to look at how certain changes in crime and weather impact the likelihood of a rider choosing to redeem a bike share at a certain location. Does an increase in crime in a certain location reduce the likelihood of a rider using bike share? Does the weather play an important role in that location’s popularity (i.e. are there riders who need rides in that location regardless of the weather or is it a location that is only utilized on good weather days)?
Finally, can our model predict ridership? If so, MetroBike will know when and how completely to replenish their stations with bicycles for the upcoming day's rides. It can also indicate areas that would be prime candidates for new stations given the amount of demand in those locations.
The datasets available for our problem are all specific to the City of Austin. We used four datasets total:
The largest set contains 18 years of public crime reports from the Austin Police Department. This is a public dataset, available on the data.texas.gov website. It is updated on a weekly basis and contains a record of incidents that the Austin Police Department responded to and filed reports on. Though this data is available from 2002 onward, we will be primarily reviewing data from 2013 through 2017, coinciding with the available data from our most limited dataset.
Adjust the slider to pinpoint crimes in a certain month
We will be joining the Crime dataset with publicly available weather data, obtained from Weather Underground, and collected from the Austin KATT Station (located at Camp Mabry in West-Central Austin). This data primarily includes historical temperature, precipitation, humidity, and wind speed for the City of Austin and local suburbs.
As you’d expect, the data needs some cleaning. The file has several quirks:
Some precipitation values are listed as ‘T’, meaning trace. This is a very small amount of rain, but not enough to register a value in inches. We replaced these with a value of 0.01 inches, so the entire column is numeric.
The events column is a string with a concatenation of ‘Fog’, ‘Rain’, ‘Snow’, and ‘Thunderstorm’, depending on which conditions were present. Here we created four new boolean columns, each set to True if Fog, Rain, Snow, or Thunderstorm occurred on that day.
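A sketch of these two cleanup steps with pandas (the column names and the hyphen-separated events format are assumptions about the raw file):

```python
import pandas as pd

raw = pd.DataFrame({
    "precip_in": ["0.32", "T", "0"],              # 'T' = trace precipitation
    "events": ["Rain-Thunderstorm", "Fog", ""],   # concatenated conditions
})

# Replace trace values with 0.01 inches so the column is fully numeric.
raw["precip_in"] = raw["precip_in"].replace("T", "0.01").astype(float)

# One boolean indicator per condition parsed out of the events string.
for event in ["Fog", "Rain", "Snow", "Thunderstorm"]:
    raw[f"has_{event.lower()}"] = raw["events"].str.contains(event)
```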
Finally, we will be measuring our publicly available data against information on nearly 650,000 Bike Share Trips within the city, made available by the City of Austin for trips from 2013 through 2017. The original dataset is available from Google Public Data. Bike shares are a service offered by a handful of providers (the most prolific in Austin being Austin B-Cycle) and are becoming a popular alternative means of transportation.
This data includes information on bike trip start location, stop location, duration and type of bike share user. An additional dataset includes bike station location data by latitude and longitude, as well as location operating status. We joined these tables on the individual station ID in order to capture latitude and longitude data for each ride.
Hover over a station to see its location:
Detrending
A major concern when working with time series data is controlling for non-stationarity. Because our goal is to accurately model the evolution of a time series (in this case, bike ridership) with respect to other observable features, we need to remove any change in the mean of the data over time. Detrending bike ridership allows us to more accurately identify subtrends in the series and determine which factors influence ridership the most.
To detrend the data, we fit a linear regression to the data and recorded the constant and slope of the line as well as their respective t-statistics. The slope, 0.1824, was significantly different from 0 (t-statistic = 6.392), meaning that ridership increased on average by 0.1824 rides per day. Since we determined there was a significant trend in ridership, we detrended the data by subtracting the cumulative trend from every data point. The results are shown below:
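The detrending step can be sketched on synthetic data with a built-in upward trend (the slope and noise level below are illustrative, chosen to echo the 0.18 rides/day figure):

```python
import numpy as np

rng = np.random.default_rng(1)
days = np.arange(365)
# Synthetic daily ridership with an upward trend of ~0.18 rides per day.
rides = 300 + 0.18 * days + rng.normal(scale=20, size=days.size)

# Fit a line and subtract the fitted trend, leaving a stationary series.
slope, intercept = np.polyfit(days, rides, deg=1)
detrended = rides - (intercept + slope * days)
```

By construction, refitting a line to the detrended series yields a slope of (numerically) zero.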
For example, on 01/01/2014, we quantify crime affecting metro bikes with the following features:
The sum of crimes within 50m, 100m, 250m, and 500m of a given station over the past 1 day, 3 days, 1 week, and 1 month.
The map below displays the stations with a 500m radius drawn around them. Any crime occurring within this circle is counted as affecting MetroBike for the relevant crime features.
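The distance check behind these features can be sketched with a haversine computation (the station and crime coordinates below are hypothetical):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def crimes_within(station, crimes, radius_m):
    """Count crime records falling inside radius_m of a station."""
    return sum(
        haversine_m(station[0], station[1], c[0], c[1]) <= radius_m
        for c in crimes
    )

# Hypothetical station near downtown Austin and two crime reports:
# one about 50m away, one roughly 2.5km north.
station = (30.2672, -97.7431)
crimes = [(30.2675, -97.7435), (30.2900, -97.7431)]
```

Repeating this count for each radius (50m to 500m) and lookback window (1 day to 1 month) yields the full grid of crime features per station per day.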
Below is the frequency of rides per day.

Chosen models and training methods
Base model: Before starting any of our machine learning models, we will establish a baseline. This gives us a benchmark of performance to compare our model against.
Time-based regression: We’ll use a linear model to predict the rentals based on the month of the year, day of the month, day of the week, and hour of day.
Daily weather and time-based regression: We’ll take the time-based model and combine it with the daily weather conditions. We hypothesize that the number of rentals will change alongside weather conditions.
Feature selection: Once we have all our features, we’ll verify which ones are most impactful to our model.
Hyperparameter tuning: We’ll look at both Ridge and Lasso models, which use regularization to help us generalize our model to new data.
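A sketch of the baseline-versus-time-features comparison on synthetic data (here the baseline simply predicts the mean daily ridership, and the only time feature is a one-hot day of week; the numbers are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
day_of_week = rng.integers(0, 7, size=400)
# Synthetic ridership: weekends (days 5 and 6) are busier, plus noise.
rides = 500 + 80 * np.isin(day_of_week, [5, 6]) + rng.normal(scale=30, size=400)

X = np.eye(7)[day_of_week]  # one-hot encode day of week

# Baseline: always predict the overall mean ridership.
baseline_pred = np.full_like(rides, rides.mean())
baseline_rmse = mean_squared_error(rides, baseline_pred) ** 0.5

# Time-based regression on the day-of-week indicators.
model = LinearRegression().fit(X, rides)
model_rmse = mean_squared_error(rides, model.predict(X)) ** 0.5
```

Any model worth keeping should beat this mean-predictor baseline; the time-based regression does so here because it captures the weekend effect.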
Above, we see some deviation from a perfect regression. On average, we are off by 213 rides per day on the testing set; on the training set, the error is much lower at 166 rides. This suggests we may have overfit the training data. Later, we attempt to create a sparser model with fewer features.
It is worth showing the features’ impact on model output at this baseline stage. As features are dropped later on, the relative importance of the remaining features will change. Currently, the categorical feature indicating whether the ride occurred in March has a tremendous impact on the model.
Lasso regression resulted in an RMSE of 201 on the training set and 186 on the testing set. At an alpha of 1.0344, lasso eliminated all but 8 of the features. These features, their feature type, and weight (ranked by importance) were: ACL (Bool) [336.70], South by Southwest (Bool) [266.15], Week 11 (Bool) [210.73], Saturday (Bool) [194.49], Friday (Bool) [48.00], March (Bool) [18.59], Average Rides 1-Day Lag (numeric) [0.59], and Crime 1-Day Lag 500m Radius (numeric) [0.22].
Ridge regression resulted in an RMSE of 187.50 on the training set and 197.92 on the testing set. At an alpha of 45.0, ridge ranked all the features by importance. The top 8 features, their feature type, and weight (ranked by importance) were: Saturday (Bool) [199.06], Week 11 (Bool) [73.96], South by Southwest (Bool) [62.53], March (Bool) [62.22], ACL (Bool) [47.88], Weather Event Occurred (Bool) [34.43], Friday (Bool) [34.00], and Week 40 (Bool) [26.00].
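The qualitative difference between the two regularizers can be sketched on synthetic data: at a comparable penalty strength, lasso zeroes out uninformative weights entirely while ridge only shrinks them (the alphas and feature counts below are illustrative, not the tuned values).

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
# Only two of ten features actually drive the target.
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=1.0, size=300)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=45.0).fit(X, y)

# Lasso drives irrelevant weights exactly to zero; ridge only shrinks them.
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
```

This is why lasso yields a short, interpretable feature list while ridge yields a full ranking of all features.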
We also attempted a neural network, using both all features and the top features from the lasso regression, along with a wide variety of hyperparameters, including batch size, number of epochs, learning rate, decay rate, number of layers, nodes per layer, and the inclusion of dropout layers. The iteration of models was conducted methodically: models were tweaked into progressively better ones by changing the hyperparameters.
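As an illustration of the kind of model being tuned (a sketch only: the team's actual framework and architecture are not specified, the data here is synthetic, and scikit-learn's MLPRegressor, used below, does not support dropout layers):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 8))
# Synthetic target with one linear and one mildly non-linear component.
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=400)

X_scaled = StandardScaler().fit_transform(X)
net = MLPRegressor(
    hidden_layer_sizes=(32, 16),   # number of layers and nodes per layer
    learning_rate_init=0.01,       # the tweak the text found most significant
    batch_size=32,
    max_iter=500,
    random_state=0,
).fit(X_scaled, y)

r2 = net.score(X_scaled, y)  # training R^2
```

Each hyperparameter in the constructor corresponds to one of the knobs the search varied; a grid or random search over these values reproduces the methodical iteration described above.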

The image above shows the training and testing loss across epochs during model training.

The image above shows the testing squared error across epochs. A couple of hyperparameter settings resulted in lower squared errors, effectively getting the model out of local minima. The most significant tweaks were to the learning rate.

The best neural network had a root mean squared error of 182 rides on the test set; on the training set, the error was 174. These results were worse than those of the simple lasso and ridge regressions.
The neural network was retrained on just the non-zero features provided by the lasso regression, and this did not lead to any further improvement.
After attempting multiple linear regression across all features, we established a baseline model that had, on average, around 213 unexplained trips per day. Given the average of 538 bike trips per day, our loss is adequate for a problem of this complexity. Most of our loss is attributed to abnormally high ridership days. We expected the features we gathered, such as whether or not the day had an event, to account for these days, but it seems our models were unable to meet that expectation.
After detrending the data, our model losses improved significantly. This aspect of the analysis was worthwhile.
Lasso regression outperformed our other models by not only lowering loss but also defining a sparse model with an optimally small number of features. Surprisingly, lasso outperformed neural networks that included interactions between variables and non-linear activations. If further analysis were to follow, new features could be introduced to capture the outlier days.
WeatherUnderground - Austin KATT Station
https://www.wunderground.com/?cm_ven=cgi The dataset above originates from Weather Underground, an IBM-owned company. As described above, this dataset includes historical weather data for the Austin, Texas area. Coupled with our crime datasets, it may reveal insight into the relationship between crime and weather patterns in the Austin area.
MetroBike - Austin B-Cycle Stations
https://austin.bcycle.com/stations The MetroBike dataset includes geolocation data for the metropolitan bike sharing initiative. These data could provide insight into typical traffic patterns and traveling preferences around the city.
https://data.austintexas.gov/Public-Safety/Crime-Reports/fdj4-gpfu/
City of Austin Crime Reports The City of Austin provides a data portal to obtain records of incidents received by the Austin Police Department. This public safety dataset provides current data that is updated weekly.
https://nycdatascience.com/blog/r/using/
Using Linear Regression to Predict Weather Patterns The article above employs linear regression to quantify the trends of weather patterns. Their research concludes that weather is a nonlinear system. This understanding has implications for our model selection and pre-training feature modifications.